viona worker LWPs should not be CPU-pinned#1146
Conversation
VNA_IOC_RING_INIT* includes, as part of init'ing a ring, creating an LWP in the calling process for that ring's worker thread. This is all fine and good, but for that LWP inheriting the binding of the original ioctl-caller. The ioctl-caller, then, is functionally always a vCPU LWP, in service of an MMIO exit by a guest who is trying to start their virtio NIC. Since one vCPU is probably handling a routine that loops over all desired virtio rings and enables them, the effective outcome is that all viona rings are handled on some sole arbitrary host CPU that is also responsible for a vCPU. Avoid all of this by temporarily unbinding whatever LWP is getting the viona ring going, then rebinding before we've continued on.
|
I haven't been able to get this on a racklette quite yet but I've tested the effect of this change by starting a large (68 vCPU) VM on dogfood, setting restarting the iperf3 client saw a steady 31gbit/sec. I don't think any of this compares in an expected way with other benchmarks I know we've done, but I'm plenty convinced this is worth checking out in a more rigorous way.. |
| let newbind: processorid_t = bind_cpu.unwrap_or(PBIND_NONE); | ||
| let mut obind: processorid_t = PBIND_NONE; |
There was a problem hiding this comment.
hm, what are the explicit processorid_t type annotations doing here? seems pretty clearly inferrable. is this just due to being sketched out by the cast in the unsafe block?
There was a problem hiding this comment.
yeah, in particular &mut obind as *mut processorid_t feels like an easy way to type confusion so I'm being a bit extra on the type annotations. and newbind got one because it felt weird to only type obind when they're the same.
| /// If the function panics, the LWP's original processor binding will not be | ||
| /// restored. |
There was a problem hiding this comment.
it seems this could be pretty easily achieved with a drop guard, but maybe it's not worth it?
There was a problem hiding this comment.
since propolis-server/propolis-standalone are panic=abort i kinda don't care too much, but also even with a drop guard there's no guarantee that runs either right?
| // Arguably one might not want to do such operations directly on a | ||
| // vCPU thread. Device setup isn't exactly on anyone's hot path so | ||
| // we'll live. |
There was a problem hiding this comment.
hm, what are you imagining here? just spawning off a short-lived thread to do one ioctl, or...?
There was a problem hiding this comment.
maybe, or a separate thread to handle "control plane" operations on a device that these register reads/writes interact with by a channel. a lot of operations are allowed to be async, and we run then synchronously because it's simpler and decoupling the register access from taking a lock and bumping a field would be silly. but this is heavy enough you can see the thinking maybe?
|
having checked out that with this the viona tx/rx LWPs aren't unexpectedly bound to some CPU after this change, I'm going to call this "clearly better than the status quo" - from here the only question to me is "is this worth mentioning in release notes once it's in", so ... lets get it in. |
VNA_IOC_RING_INIT* includes, as part of init'ing a ring, creating an LWP in the calling process for that ring's worker thread. This is all fine and good, but for that LWP inheriting the binding of the original ioctl-caller. The ioctl-caller, then, is functionally always a vCPU LWP, in service of an MMIO exit by a guest who is trying to start their virtio NIC.
Since one vCPU is probably handling a routine that loops over all desired virtio rings and enables them, the effective outcome is that all viona rings are handled on some sole arbitrary host CPU that is also responsible for a vCPU.
Avoid all of this by temporarily unbinding whatever LWP is getting the viona ring going, then rebinding before we've continued on.